Distributed linear regression by averaging

Abstract

Distributed statistical learning problems arise commonly when dealing with large datasets. In this setup, datasets are partitioned over machines, which compute locally and communicate short messages. Communication is often the bottleneck. In this paper, we study one-step and iterative weighted parameter averaging in linear models under data parallelism. We do regression on each machine, send the results to a central server, and take a weighted average of the parameters. Optionally, we iterate, sending back the average and doing local ridge regressions centered at it. How does this work compared to doing regression on the full data? Here, we study the performance loss in estimation error, test error, and confidence interval length in high dimensions, where the number of parameters is comparable to the training data size. We find the performance loss in one-step weighted averaging, and also give results for iterative averaging. We also find that different problems are affected differently by the distributed framework. Estimation error increases a lot, while prediction error increases much less. We rely on recent results from random matrix theory, and develop a new calculus of deterministic equivalents as a tool of broader interest.
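
The procedure described in the abstract can be illustrated with a minimal NumPy sketch. This is a sketch under stated assumptions, not the authors' implementation: the function names (`split_rows`, `one_step_average`, `centered_ridge`, `iterative_average`), the uniform default weights, and the penalty `lam` are all hypothetical choices made for illustration. The abstract does not specify the weighting scheme or the ridge penalty.

```python
import numpy as np


def split_rows(X, y, k):
    """Partition the rows of (X, y) into k blocks, one per simulated machine."""
    return list(zip(np.array_split(X, k), np.array_split(y, k)))


def local_ols(X, y):
    """Ordinary least squares on one machine's block of data."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def one_step_average(parts, weights=None):
    """Fit OLS on each block, then take a weighted average of the estimates.

    Assumption: uniform weights 1/k when none are supplied.
    """
    betas = np.stack([local_ols(X, y) for X, y in parts])
    if weights is None:
        weights = np.full(len(parts), 1.0 / len(parts))
    return np.asarray(weights, dtype=float) @ betas


def centered_ridge(X, y, center, lam):
    """Solve argmin_b ||y - X b||^2 + lam * ||b - center||^2 in closed form."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y + lam * center)


def iterative_average(parts, lam=1.0, n_iter=5, weights=None):
    """Start from the one-step average, then repeat: send the current average
    back, run a ridge regression centered at it on each block, and re-average."""
    if weights is None:
        weights = np.full(len(parts), 1.0 / len(parts))
    weights = np.asarray(weights, dtype=float)
    beta = one_step_average(parts, weights)
    for _ in range(n_iter):
        betas = np.stack([centered_ridge(X, y, beta, lam) for X, y in parts])
        beta = weights @ betas
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, k = 2000, 200, 4  # p comparable to the per-machine sample size n / k
    beta_true = rng.standard_normal(p) / np.sqrt(p)
    X = rng.standard_normal((n, p))
    y = X @ beta_true + rng.standard_normal(n)

    parts = split_rows(X, y, k)
    estimates = {
        "full-data OLS": local_ols(X, y),
        "one-step average": one_step_average(parts),
        "iterative average": iterative_average(parts, lam=50.0, n_iter=5),
    }
    for name, est in estimates.items():
        print(f"{name}: squared estimation error = {np.sum((est - beta_true) ** 2):.4f}")
```

Only the k local estimates (and, when iterating, the current average) cross machine boundaries, which is what keeps communication short. Comparing the printed errors on simulated data gives a rough feel for the performance loss the abstract refers to; the exact numbers depend on the simulated design and on the assumed weights and penalty.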

Similar resources

Bayesian Model Averaging for Linear Regression Models

We consider the problem of accounting for model uncertainty in linear regression models. Conditioning on a single selected model ignores model uncertainty, and thus leads to the underestimation of uncertainty when making inferences about quantities of interest. A Bayesian solution to this problem involves averaging over all possible models (i.e., combinations of predictors) when making inferenc...

Faster Linear Iterations for Distributed Averaging

Abstract: Distributed averaging problems are a subclass of distributed consensus problems, which have received substantial attention from several research communities. Although many of the proposed algorithms are linear iterations, they vary both in structure and state dimension. In this paper, we investigate the performance benefits of adding extra states to distributed averaging iterations. W...

Optimal Finite-Time Distributed Linear Averaging

A new optimal finite-time distributed linear averaging (OFTDLA) problem is presented in this paper. This problem is motivated from the distributed averaging problem which arises in the context of distributed algorithms in computer science and coordination of groups of autonomous agents in engineering. The aim of the OFTDLA problem is to compute the average of the initial values in finite-time s...

Fast linear iterations for distributed averaging

We consider the problem of finding a linear iteration that yields distributed averaging consensus over a network, i.e., that asymptotically computes the average of some initial values given at the nodes. When the iteration is assumed symmetric, the problem of finding the fastest converging linear iteration can be cast as a semidefinite program, and therefore efficiently and globally solved. These op...

Sequential Model Averaging for High Dimensional Linear Regression Models

In high dimensional data analysis, we propose a sequential model averaging (SMA) method to make accurate and stable predictions. Specifically, we introduce a hybrid approach that combines a sequential screening process with a model averaging algorithm, where the weight of each model is determined by its Bayesian information criterion (BIC) score (Schwarz, 1978; Chen and Chen, 2008). The sequential techni...

Journal

Journal title: Annals of Statistics

Year: 2021

ISSN: 0090-5364, 2168-8966

DOI: https://doi.org/10.1214/20-aos1984